We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the worker such that the overall result can be recovered from only the non-straggling workers. Gradient codes are designed to tolerate a fixed number of stragglers. Since the number of stragglers in practice is random and unknown a priori, tolerating a fixed number of stragglers can yield a sub-optimal computation load and can result in higher latency. We propose a gradient coding scheme that can tolerate a flexible number of stragglers by carefully concatenating gradient codes for different straggler tolerance. By proper task scheduling and small additional signaling, our scheme adapts the computation load of the workers to the actual number of stragglers. We analyze the latency of our proposed scheme and show that it has a significantly lower latency than gradient codes.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
To date, the comparison of Statistical Shape Models (SSMs) is often solely performance-based and carried out by means of simplistic metrics such as compactness, generalization, or specificity. Any similarities or differences between the actual shape spaces can neither be visualized nor quantified. In this paper, we present a first method to compare two SSMs in dense correspondence by computing approximate intersection spaces and set-theoretic differences between the affine vector spaces spanned by the models. To this end, we approximate the distribution of shapes lying in the intersection space using Markov Chain Monte Carlo, and then apply Principal Component Analysis (PCA) to its samples. By representing the resulting spaces again as an SSM, our method enables an easy and intuitive analysis of similarities between two model's shape spaces. We estimate differences between SSMs in a similar manner; here, however, the resulting shape spaces are not linear vector spaces anymore and we do not apply PCA but instead use the posterior samples for visualization. We showcase the proposed algorithm qualitatively by computing and analyzing intersection spaces and differences between publicly available face models focusing on gender-specific male and female as well as identity and expression models. Our quantitative evaluation based on SSMs built from synthetic and real-world data sets provides detailed evidence that the introduced method is able to recover ground-truth intersection spaces and differences. Finally, we demonstrate that the proposed algorithm can be easily adapted to also compute intersections and differences between color spaces.
translated by 谷歌翻译
我们考虑分布式SGD问题,其中主节点在$ n $工人之间分配梯度计算。通过将任务分配给所有工人,只等待$ k $最快的工人,主节点可以随着算法的发展而逐渐增加$ k $,可以权衡算法的错误。但是,这种策略被称为自适应$ k $ -sync,忽略了未使用的计算的成本和向揭示出散布行为的工人进行交流模型的成本。我们提出了一个成本效益的计划,将任务仅分配给$ k $工人,并逐渐增加$ k $。我们介绍了组合多臂匪徒模型的使用来了解哪些工人在分配梯度计算时最快。假设具有指数分布的响应时间以不同方式参数的工人,我们会以我们的策略的遗憾(即学习工人的平均响应时间花费的额外时间)提供经验和理论保证。此外,我们提出和分析适用于大量响应时间分布的策略。与自适应$ k $ -sync相比,我们的计划通过相同的计算工作和较小的下行链路通信在速度较低的情况下,误差大大降低。
translated by 谷歌翻译
患者定期继续在其他设施中进行评估或治疗,而不是他们开始他们,以前的成像研究作为CD-ROM,并要求新医院的临床工作人员将这些研究导入其本地数据库。然而,在不同的设施之间,命名法,内容甚至医疗程序之间的标准可能会有所不同,通常需要人为干预,以准确地将所接受的研究在受援所在医院的标准中进行分类。在这项研究中,作者呈现MOMO(模态映射和编排),一种基于深入的学习方法,用于利用元数据子字符串匹配和神经网络集合来自动化该映射过程,培训以识别七种不同的76个最常见的成像研究方式。进行回顾性研究以测量该算法可以提供的准确性。为此,从当地医院的PACS数据库检索了一组11,934个与现有标签的成像系列,以培训神经网络。一套843个完全匿名的外部研究被手动标记为评估我们算法的性能。另外,进行烧蚀研究以测量算法中网络集合的性能影响,并进行商业产品的比较性能测试。与商业产品相比(预测功率为82.86%,精度为82.86%,1.36%的小错误),单独的神经网络集合单独使用较低的准确度进行分类任务(预测功率为99.05%,精度为72.69%,较小误差10.3%)。然而,MOMO以准确性大的裕度优于大幅度,并且具有增加的预测功率(预测功率为99.29%,精度为92.71%,2.63%的小错误)。
translated by 谷歌翻译
Neural compression offers a domain-agnostic approach to creating codecs for lossy or lossless compression via deep generative models. For sequence compression, however, most deep sequence models have costs that scale with the sequence length rather than the sequence complexity. In this work, we instead treat data sequences as observations from an underlying continuous-time process and learn how to efficiently discretize while retaining information about the full sequence. As a consequence of decoupling sequential information from its temporal discretization, our approach allows for greater compression rates and smaller computational complexity. Moreover, the continuous-time approach naturally allows us to decode at different time intervals. We empirically verify our approach on multiple domains involving compression of video and motion capture sequences, showing that our approaches can automatically achieve reductions in bit rates by learning how to discretize.
translated by 谷歌翻译
This paper extends quantile factor analysis to a probabilistic variant that incorporates regularization and computationally efficient variational approximations. By means of synthetic and real data experiments it is established that the proposed estimator can achieve, in many cases, better accuracy than a recently proposed loss-based estimator. We contribute to the literature on measuring uncertainty by extracting new indexes of low, medium and high economic policy uncertainty, using the probabilistic quantile factor methodology. Medium and high indexes have clear contractionary effects, while the low index is benign for the economy, showing that not all manifestations of uncertainty are the same.
translated by 谷歌翻译
We introduce organism networks, which function like a single neural network but are composed of several neural particle networks; while each particle network fulfils the role of a single weight application within the organism network, it is also trained to self-replicate its own weights. As organism networks feature vastly more parameters than simpler architectures, we perform our initial experiments on an arithmetic task as well as on simplified MNIST-dataset classification as a collective. We observe that individual particle networks tend to specialise in either of the tasks and that the ones fully specialised in the secondary task may be dropped from the network without hindering the computational accuracy of the primary task. This leads to the discovery of a novel pruning-strategy for sparse neural networks
translated by 谷歌翻译
Overfitting is a problem in Convolutional Neural Networks (CNN) that causes poor generalization of models on unseen data. To remediate this problem, many new and diverse data augmentation methods (DA) have been proposed to supplement or generate more training data, and thereby increase its quality. In this work, we propose a new data augmentation algorithm: VoronoiPatches (VP). We primarily utilize non-linear recombination of information within an image, fragmenting and occluding small information patches. Unlike other DA methods, VP uses small convex polygon-shaped patches in a random layout to transport information around within an image. Sudden transitions created between patches and the original image can, optionally, be smoothed. In our experiments, VP outperformed current DA methods regarding model variance and overfitting tendencies. We demonstrate data augmentation utilizing non-linear re-combination of information within images, and non-orthogonal shapes and structures improves CNN model robustness on unseen data.
translated by 谷歌翻译
An optimal delivery of arguments is key to persuasion in any debate, both for humans and for AI systems. This requires the use of clear and fluent claims relevant to the given debate. Prior work has studied the automatic assessment of argument quality extensively. Yet, no approach actually improves the quality so far. Our work is the first step towards filling this gap. We propose the task of claim optimization: to rewrite argumentative claims to optimize their delivery. As an initial approach, we first generate a candidate set of optimized claims using a sequence-to-sequence model, such as BART, while taking into account contextual information. Our key idea is then to rerank generated candidates with respect to different quality metrics to find the best optimization. In automatic and human evaluation, we outperform different reranking baselines on an English corpus, improving 60% of all claims (worsening 16% only). Follow-up analyses reveal that, beyond copy editing, our approach often specifies claims with details, whereas it adds less evidence than humans do. Moreover, its capabilities generalize well to other domains, such as instructional texts.
translated by 谷歌翻译